Unsupervised Data Partitioning: a Bayesian Approach
Authors
Abstract
A Bayesian methodology is presented which automatically penalises over-complex models fitted to unknown data. We show that, with a Gaussian mixture model, the approach is able to select an 'optimal' number of components in the model and thereby partition data sets.
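The abstract does not give the details of the method. As a rough illustration of the underlying idea (a complexity penalty that stops extra mixture components from being rewarded automatically), the sketch below fits 1-D Gaussian mixtures by EM and picks the number of components with the Bayesian information criterion (BIC), a standard large-sample approximation to the Bayesian model evidence. BIC is a swapped-in technique, not the authors' approach, and the function names, the quantile initialisation and the variance floor are assumptions made for this example.

```python
import numpy as np


def _component_log_densities(x, weights, means, variances):
    """log(w_j * N(x_i | mu_j, var_j)) for every point i and component j."""
    return (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)


def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM and return its log-likelihood."""
    # Deterministic start (an assumption): means at evenly spaced quantiles,
    # pooled variance for every component, equal mixing weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in log space for numerical stability.
        log_p = _component_log_densities(x, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        # The small floor keeps a component from collapsing onto a single point.
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    log_p = _component_log_densities(x, weights, means, variances)
    return float(np.logaddexp.reduce(log_p, axis=1).sum())


def bic(log_lik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in 1-D."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -2.0 * log_lik + n_params * np.log(n)


def select_n_components(x, k_max=5):
    """Return the k in 1..k_max with the lowest BIC, together with all BIC scores."""
    scores = {k: bic(fit_gmm_1d(x, k), k, x.size) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic data: three well-separated Gaussian clusters of 200 points each.
    rng = np.random.default_rng(42)
    x = np.concatenate([rng.normal(m, 0.5, 200) for m in (-5.0, 0.0, 5.0)])
    best_k, scores = select_n_components(x, k_max=5)
    print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

Each extra component must raise the log-likelihood by more than (3/2) log n to lower the BIC, so components that only split an existing cluster are usually rejected; this is the same "penalise over-complex models" behaviour the abstract describes, obtained here by an approximation rather than a full Bayesian treatment.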
Related articles
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Refining A Divisive Partitioning Algorithm for Unsupervised Clustering
The Principal Direction Divisive Partitioning (PDDP) algorithm is a fast and scalable clustering algorithm [3]. The basic idea is to recursively split the data set into sub-clusters based on principal direction vectors. However, the PDDP algorithm can yield poor results, especially when cluster structures are not well-separated from one another. Its stopping criterion is based on a heuristic th...
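A minimal NumPy sketch of the principal-direction bisection idea summarised above: centre a cluster, project it onto its leading principal direction, and split it by the sign of the projection. Two choices here are assumptions, not taken from the listing: the cluster bisected at each step is the one with the largest scatter, and splitting stops at a fixed, user-supplied `n_clusters` rather than at PDDP's own heuristic stopping criterion. The function names are hypothetical.

```python
import numpy as np


def principal_split(points):
    """Split rows of `points` by the sign of their projection on the principal direction."""
    centered = points - points.mean(axis=0)
    # The first right singular vector of the centred data is the leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    return projection <= 0, projection > 0


def scatter(points):
    """Total squared distance of the rows of `points` to their centroid."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def pddp(data, n_clusters):
    """Divisive clustering: repeatedly bisect the cluster with the largest scatter."""
    clusters = [np.arange(len(data))]  # each cluster is an array of row indices
    while len(clusters) < n_clusters:
        # Pick the leaf cluster whose points are most spread out.
        i = max(range(len(clusters)), key=lambda j: scatter(data[clusters[j]]))
        idx = clusters.pop(i)
        left, right = principal_split(data[idx])
        clusters.extend([idx[left], idx[right]])
    labels = np.empty(len(data), dtype=int)
    for label, idx in enumerate(clusters):
        labels[idx] = label
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = [(0.0, 0.0), (4.0, 0.0), (20.0, 0.0)]
    data = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])
    print(pddp(data, 3))
```

Because each split cuts at the centroid of the current cluster, a cluster lying across that centroid gets divided, which is one way the poor results on clusters that are not well separated, noted in the abstract, can arise.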
Using Bayesian Blocks to Partition Self-Organizing Maps
Self-organizing maps (SOMs) are widely used for unsupervised classification. For this application, they must be combined with some partitioning scheme that can identify boundaries between distinct regions in the maps they produce. We discuss a novel partitioning scheme for SOMs based on the Bayesian Blocks segmentation algorithm of Scargle [1998]. This algorithm minimizes a cost function to ide...
Multivariate Data Grid Models for Supervised and Unsupervised Learning Note technique
This paper introduces a new method to automatically, rapidly and reliably evaluate the class conditional information of any subset of variables in supervised learning. It is based on partitioning each input variable into intervals in the numerical case and into groups of values in the categorical case. The cross-product of the univariate partitions forms a multivariate partition of the in...
Unsupervised Coreference Resolution with HyperGraph Partitioning
Unsupervised-learning-based coreference resolution obviates the need for annotated training data. However, unsupervised approaches have traditionally relied on mention-pair models, which only consider information pertaining to one pair of mentions at a time. This paper proposes the use of hypergraph partitioning to overcome this limitation. The mentions are modeled ...